SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents
Authors
Abstract
Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs), or variants of them, of the XML elements that contain the input keywords, and then return the subtrees rooted at these LCAs as the answers. In this paper, we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, that indexes such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices that maintain the structural relationships needed to identify the minimal-cost trees efficiently. To identify the top-k answers effectively and progressively, we develop techniques based on link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware Dewey code, into our structure-aware indices. Experimental results on real data sets show that our method significantly outperforms state-of-the-art approaches in both answer quality and search efficiency. © 2009 Elsevier Inc. All rights reserved.
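To make the LCA-based baseline described above concrete, here is a minimal sketch of plain LCA and smallest-LCA (SLCA) computation over Dewey codes. It assumes ordinary Dewey codes (each element labeled by the path of child positions from the root), not the schema-aware Dewey code the paper proposes, and a hypothetical in-memory inverted list `inverted` mapping each keyword to the Dewey codes of the elements containing it. It does not implement SAIL's minimal-cost trees, structure-aware indices, or ranking; it only illustrates the standard fact that the LCA of elements is the longest common prefix of their Dewey codes.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# A Dewey code as a tuple of child positions; (0,) is the root,
# (0, 1, 0) is the first child of the second child of the root.
Dewey = Tuple[int, ...]


def lca(codes: List[Dewey]) -> Dewey:
    """LCA of several elements = longest common prefix of their Dewey codes."""
    prefix: List[int] = []
    for components in zip(*codes):
        if all(c == components[0] for c in components):
            prefix.append(components[0])
        else:
            break
    return tuple(prefix)


def is_ancestor(a: Dewey, b: Dewey) -> bool:
    """True if a is a proper ancestor of b, i.e. a is a strict prefix of b."""
    return len(a) < len(b) and b[: len(a)] == a


def keyword_lcas(inverted: Dict[str, List[Dewey]], query: List[str]) -> Set[Dewey]:
    """All LCAs obtained by picking one matching element per keyword (brute force)."""
    lists = [inverted.get(k, []) for k in query]
    if not lists or any(not lst for lst in lists):
        return set()  # some keyword has no match, so there is no answer
    return {lca(list(combo)) for combo in product(*lists)}


def smallest_lcas(candidates: Set[Dewey]) -> List[Dewey]:
    """Keep only LCAs that have no other candidate LCA below them (SLCA semantics)."""
    return sorted(
        c for c in candidates if not any(is_ancestor(c, d) for d in candidates)
    )


if __name__ == "__main__":
    # Hypothetical toy document: root (0,) with two <paper> children;
    # paper (0,0) has title "xml" (0,0,0) and author "lee" (0,0,1),
    # paper (0,1) has title "xml ..." (0,1,0).
    inverted = {"xml": [(0, 0, 0), (0, 1, 0)], "lee": [(0, 0, 1)]}
    print(smallest_lcas(keyword_lcas(inverted, ["xml", "lee"])))  # [(0, 0)]
```

Enumerating every combination with `product` grows exponentially with the number of keywords, so it is only suitable for illustration; practical systems typically avoid this by scanning the sorted Dewey lists of the keywords instead.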
Similar resources
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers by relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Answering Tag-Term Keyword Queries over XML Documents in DHT Networks
The emergence of Peer-to-Peer (P2P) computing model and the popularity of Extensible Markup Language (XML) as the web data format have fueled the extensive research on retrieving XML data in P2P networks. In this paper, we developed an efficient and effective keyword search framework that can support tag-term keyword queries in Distributed Hash Table (DHT) networks. We employed a concise Bloom-...
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this e...
A Survey on Keyword Diversification Over XML Data
Keyword queries are the terms that users enter to retrieve documents containing all or any of those terms. They are the most familiar and popular method ordinary users use to search data. Keyword queries are highly ambiguous. Keyword search has emerged as one of the most effective ways of discovering information, especially over HTML documents on the World Wide Web. Because ...
Adaptive Partitioned Indexes for Efficient XML Keyword Search
1. INTRODUCTION Keyword search, which is extensively used for searches over flat HTML documents on the web, is a simple and effective paradigm for information discovery. have studied how to effectively apply this useful paradigm to searches over XML documents. XML Keyword search makes it possible for users to obtain relevant information without having to know complex query syntaxes (
Journal: Inf. Sci.
Volume: 179
Publication date: 2009